Corpus Annotation of Macro Discourse Structures

Authors

  • Lydia-Mai Ho-Dac
  • Cécile Fabre
  • Marie-Paule Péry-Woodley
  • Josette Rebeyrolle
Abstract

We present our discourse annotation project, ANNODIS, which aims to make available a diversified French corpus annotated with discourse information, along with a set of tools for annotation and corpus exploitation. An original aspect of the project is that it combines two theoretically and methodologically different points of view on discourse: bottom-up and top-down. In the bottom-up perspective, basic constituents are identified and linked via discourse relations. In a complementary manner, the top-down approach starts from the text as a whole and focuses on the identification of configurations of cues signalling higher-level text segments, in an attempt to address the interplay of continuity and discontinuity within discourse. The focus of this paper is the annotation scheme used in the top-down approach, which revolves around enumerative structures. These structures, which are of particular interest to our project because of their ability to occur in nested configurations and at all levels of granularity (from within a sentence to across text sections), are the discourse object chosen to “bootstrap” our approach. We describe the different stages involved: corpus selection, pre-processing and “marking” techniques, and the specific interface facilities designed to make it possible for coders to navigate and scan the text in order to identify relevant spans at different granularity levels.

Keywords: corpus annotation, discourse organisation, text segmentation, macro discourse structures, computational linguistics

I. A MACRO APPROACH TO DISCOURSE ORGANISATION

The work presented here is part of a larger project, the ANNODIS project, the aim of which is to create an annotated corpus of French written texts for the study of discourse organisation. Discourse organisation is difficult to study with corpus linguistics methods because of the lack of

Similar resources

A Discourse-Annotated Corpus of Conjoined VPs

English grammars indicate a variety of relations holding between conjoined VPs. VPs conjoined by and evince such senses as Result, Temporal Sequence and Concession. Although all these senses are ones associated with discourse relations, conjoined VPs have not been fully included in discourse annotation. Because of the value of discourse-annotated corpora for developing approaches to automated s...

STANCE AND ENGAGEMENT DISCOURSE MARKERS IN JOURNAL’S “AUTHOR GUIDELINES”

Over the past decade, there has been an increasing interest in the study of interactional metadiscourse markers in different contexts. However, not much research has been conducted about the discourse of journal author guidelines, especially the use of meta-discourse markers in this genre. Therefore, this corpus-based study had three main aims: 1) to delve deep into the types, frequencies and f...

An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus

We describe the Annodis corpus of discourse structures for French. The corpus joins two perspectives on discourse on a variety of textual genres: a bottom-up approach and a top-down approach. The bottom-up view builds incrementally a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed ...

Expressivity and comparison of models of discourse structure

Several discourse annotated corpora now exist for NLP. But they use different, not easily comparable annotation schemes: are the structures these schemes describe incompatible, incomparable, or do they share interpretations? In this paper, we relate three types of discourse annotation used in corpora or discourse parsing: (i) RST, (ii) SDRT, and (iii) dependency tree structures. We offer a comm...

A corpus-driven approach to discourse organisation: from cues to complex markers

This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature mark-up alongside manual annotation, we explore a method to identify complex discours...



Publication date: 2009